Neural approaches to dialog modeling
This thesis by article consists of four articles that contribute to the field of deep learning, specifically to understanding and learning neural approaches to dialog systems. The first article takes a step towards understanding whether commonly used neural dialog architectures effectively capture the information present in the conversation history. Through a series of perturbation experiments on popular dialog datasets, we find that commonly used neural dialog architectures, like recurrent and transformer-based seq2seq models, are rarely sensitive to most input context perturbations such as missing or reordered utterances, shuffled words, etc.
The second article introduces a simple and cost-effective way to collect large-scale datasets for modeling task-oriented dialog systems. This approach avoids the need for a complex argument annotation schema. The initial release of the dataset includes 13,215 task-based dialogs spanning six domains and around 8k unique named entities, almost 8 times more than the popular MultiWOZ dataset.
The third article proposes to improve response generation quality in open-domain dialog systems by jointly modeling the utterances with the dialog attributes of each utterance. The dialog attributes of an utterance are discrete features or aspects associated with it, such as dialog acts, sentiment, emotion, speaker identity, and speaker personality.
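Here is one generic way to picture "jointly modeling" an utterance u and its attributes a given a context c. Factor the joint probability as p(a, u | c) = p(a | c) · p(u | a, c), then serialize the attributes as tokens placed before the words, so that one autoregressive decoder learns both factors. This is a minimal sketch under that assumption, not necessarily how the article models attributes. The `Utterance` class, the `joint_target` function and the `<name=value>` token format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Utterance:
    """One dialog turn plus its discrete attributes (attribute names are illustrative)."""
    speaker: str
    text: str
    attributes: Dict[str, str] = field(default_factory=dict)


def joint_target(utt: Utterance, attribute_order: Sequence[str]) -> List[str]:
    """Hypothetical serialization: attribute tokens first, then the utterance words.

    A left-to-right decoder trained on this sequence predicts the attributes
    before the words, i.e. p(attributes | context) * p(words | attributes, context).
    """
    attribute_tokens = [
        f"<{name}={utt.attributes[name]}>"
        for name in attribute_order
        if name in utt.attributes  # skip attributes not annotated for this turn
    ]
    return attribute_tokens + utt.text.split()


turn = Utterance("B", "sure , what time ?", {"dialog_act": "question", "sentiment": "positive"})
print(joint_target(turn, ["dialog_act", "emotion", "sentiment"]))
# ['<dialog_act=question>', '<sentiment=positive>', 'sure', ',', 'what', 'time', '?']
```

Putting the attributes first lets the decoder commit to a dialog act or sentiment before it produces any words. At inference time the same sequence order can also be used to fix an attribute by hand.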
The final article introduces an embedding-free method to compute word representations on the fly. This approach significantly reduces the memory footprint, which facilitates deployment on memory-constrained devices. Apart from being independent of the vocabulary size, we find this approach to be inherently resilient to common misspellings.
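Below is a sketch of how a word vector can be computed on the fly without an embedding table. It swaps in the well-known hashing trick over character n-grams, which is not necessarily the article's projection method. The function names, the n-gram sizes (2 and 3), the MD5-based hash and the dimension 2048 are illustrative assumptions. Memory use is fixed by `dim` rather than by vocabulary size. A misspelling also shares most of its n-grams with the correct word, so the two vectors stay close.

```python
import hashlib
import math
from typing import List, Sequence


def char_ngrams(word: str, n_values: Sequence[int] = (2, 3)) -> List[str]:
    """Character n-grams of the word, with < and > marking its boundaries."""
    padded = f"<{word.lower()}>"
    return [padded[i:i + n] for n in n_values for i in range(len(padded) - n + 1)]


def hashed_vector(word: str, dim: int = 2048) -> List[float]:
    """Signed feature hashing: each n-gram adds +1 or -1 to one of dim buckets."""
    vec = [0.0] * dim
    for gram in char_ngrams(word):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % dim  # bucket chosen by the hash
        sign = 1.0 if digest[4] & 1 else -1.0               # sign reduces collision bias
        vec[index] += sign
    return vec


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Any string gets a vector, including words never seen before.
print(round(cosine(hashed_vector("hello"), hashed_vector("helo")), 2))
print(round(cosine(hashed_vector("hello"), hashed_vector("zzzzz")), 2))
```

"hello" and "helo" share 8 of their n-grams, so their cosine similarity is high (about 0.8 when no buckets collide). Unrelated strings share almost none.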
AUTODIAL: Efficient Asynchronous Task-Oriented Dialogue Model
As large dialogue models become commonplace in practice, the problems
surrounding high compute requirements for training and inference and a larger
memory footprint still persist. In this work, we present AUTODIAL, a multi-task
dialogue model that addresses the challenges of deploying dialogue models.
AUTODIAL utilizes parallel decoders to perform tasks such as dialogue act
prediction, domain prediction, intent prediction, and dialogue state tracking.
Using classification decoders instead of generative decoders allows AUTODIAL to
significantly reduce its memory footprint and achieve faster inference than an
existing generative approach, SimpleTOD. We demonstrate that AUTODIAL provides
3-6x speedups during inference while having 11x fewer parameters than SimpleTOD
on three dialogue tasks. Our results show that extending current dialogue
models to have parallel decoders can be a viable alternative for deploying
them in resource-constrained environments.
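The sketch below shows what "classification decoders" means in contrast to generating labels token by token. Each task gets a small linear classifier that reads the same shared encoder state, and all tasks are decided in one parallel step. This is a toy illustration under stated assumptions, not AUTODIAL's architecture. The class name, the task and label sets, the random weights and the constant stand-in hidden vector are all hypothetical, and the encoder itself is omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple

Head = Tuple[List[List[float]], List[float]]  # (weight rows, biases), one row per label


def linear(hidden: Sequence[float], head: Head) -> List[float]:
    weights, biases = head
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(weights, biases)]


class ParallelClassificationHeads:
    """Toy multi-task decoder: one linear classifier per task over a shared state."""

    def __init__(self, hidden_size: int, label_sets: Dict[str, List[str]], seed: int = 0):
        rng = random.Random(seed)
        self.label_sets = label_sets
        self.heads: Dict[str, Head] = {
            task: (
                [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)] for _ in labels],
                [0.0] * len(labels),
            )
            for task, labels in label_sets.items()
        }

    def num_parameters(self) -> int:
        return sum(len(w) * len(w[0]) + len(b) for w, b in self.heads.values())

    def predict(self, hidden: Sequence[float]) -> Dict[str, str]:
        # Every head reads the same hidden vector, so each task's label comes
        # from one argmax instead of a sequence of generated tokens.
        predictions = {}
        for task, head in self.heads.items():
            scores = linear(hidden, head)
            best = max(range(len(scores)), key=scores.__getitem__)
            predictions[task] = self.label_sets[task][best]
        return predictions


labels = {
    "domain": ["restaurant", "hotel", "taxi"],
    "intent": ["book", "inform", "request", "other"],
    "dialogue_act": ["greet", "inform", "request", "confirm", "bye"],
}
model = ParallelClassificationHeads(hidden_size=8, label_sets=labels)
hidden_state = [0.5] * 8  # stand-in for a pooled encoder output
print(model.predict(hidden_state))
print(model.num_parameters())  # 8*3+3 + 8*4+4 + 8*5+5 = 108
```

Each head adds only hidden_size × num_labels + num_labels parameters, and it needs no autoregressive loop at inference. That is the general reason classification heads are cheaper than a generative decoder.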
Data-Efficiency with a Single GPU: An Exploration of Transfer Methods for Small Language Models
Multi-task learning (MTL), instruction tuning, and prompting have recently
been shown to improve the generalizability of large language models to new
tasks. However, the benefits of such methods are less well-documented in
smaller language models, with some studies finding contradictory results. In
this work, we explore and isolate the effects of (i) model size, (ii) general-purpose
MTL, (iii) in-domain MTL, (iv) instruction tuning, and (v) few-shot
fine-tuning for models with fewer than 500 million parameters. Our experiments
in the zero-shot setting demonstrate that models gain a 31% relative improvement,
on average, from general-purpose MTL, with an additional 37.6% relative gain
from in-domain MTL. Contrary to prior work on large models, we find that
instruction tuning provides a modest 2% performance improvement for small
models.
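The abstract reports stacked relative gains, and the combined effect depends on what the second gain is measured against, which the abstract does not state. The worked arithmetic below shows both readings. Both readings are assumptions, and the baseline score of 10.0 is a made-up number used only to illustrate the formula.

```python
def relative_gain(new_score: float, base_score: float) -> float:
    """Relative improvement of new_score over base_score."""
    return (new_score - base_score) / base_score


general_mtl_gain = 0.31   # from the abstract
in_domain_extra = 0.376   # from the abstract

# Reading 1 (assumed): the 37.6% is measured against the general-MTL model,
# so the two gains compound multiplicatively.
compounded = (1 + general_mtl_gain) * (1 + in_domain_extra) - 1

# Reading 2 (assumed): both gains are measured against the same base model,
# so they simply add.
additive = general_mtl_gain + in_domain_extra

print(f"compounded: {compounded:.1%}, additive: {additive:.1%}")
# compounded: 80.3%, additive: 68.6%
print(round(relative_gain(13.1, 10.0), 3))  # hypothetical scores: 10.0 -> 13.1
# 0.31
```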
Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study
Neural generative models have become increasingly popular when building
conversational agents. They offer flexibility, can be easily adapted to new
domains, and require minimal domain engineering. A common criticism of these
systems is that they seldom understand or use the available dialog history
effectively. In this paper, we take an empirical approach to understanding how
these models use the available dialog history by studying the sensitivity of
the models to artificially introduced unnatural changes or perturbations to
their context at test time. We experiment with 10 different types of
perturbations on 4 multi-turn dialog datasets and find that commonly used
neural dialog architectures like recurrent and transformer-based seq2seq models
are rarely sensitive to most perturbations such as missing or reordered
utterances, shuffled words, etc. We also open-source our code, which we believe
will serve as a useful diagnostic tool for evaluating dialog systems in the
future.
Comment: To appear at ACL 2019 (oral; nominated for best paper)
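The sketch below applies a few of the perturbations named in the abstract (missing utterances, reordered utterances and shuffled words) to a dialog history stored as a list of utterance strings. It also defines sensitivity as the change in a model score when the context is perturbed. This is a minimal sketch, not the authors' open-sourced code. The function names and the seeded shuffling are assumptions, and `score_fn` is a hypothetical hook, for example a model's perplexity on the gold response.

```python
import random
from typing import Callable, List

History = List[str]  # one string per utterance, oldest first


def shuffle_utterances(history: History, seed: int = 0) -> History:
    """Reorder whole utterances at random; each utterance's text is unchanged."""
    perturbed = list(history)
    random.Random(seed).shuffle(perturbed)
    return perturbed


def reverse_utterances(history: History) -> History:
    return list(reversed(history))


def drop_utterance(history: History, index: int) -> History:
    """Remove one utterance (negative indices count from the end)."""
    perturbed = list(history)
    del perturbed[index]
    return perturbed


def shuffle_words(history: History, seed: int = 0) -> History:
    """Shuffle the words inside each utterance while keeping the turn order."""
    rng = random.Random(seed)
    perturbed = []
    for utterance in history:
        words = utterance.split()
        rng.shuffle(words)
        perturbed.append(" ".join(words))
    return perturbed


def sensitivity(
    score_fn: Callable[[History, str], float],
    history: History,
    response: str,
    perturb: Callable[[History], History],
) -> float:
    """Change in score_fn when the context is perturbed (0.0 means insensitive)."""
    return score_fn(perturb(history), response) - score_fn(history, response)


history = ["hi there", "hello , how can i help ?", "book a table for two"]
print(reverse_utterances(history))
# ['book a table for two', 'hello , how can i help ?', 'hi there']
print(drop_utterance(history, -1))
# ['hi there', 'hello , how can i help ?']
```

Every function copies its input, so the same original history can be passed through each perturbation in turn and compared against the unperturbed score.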